AITopics | ideal transition function

8078e8c3055303a884ffae2d3ea00338-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-19-2026, 06:53:17 GMT

halfcheetah, implementation, transition function, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.31)

I2Q: A Fully Decentralized Q-Learning Algorithm

Neural Information Processing Systems | Dec-24-2025, 15:02:06 GMT

Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks in which global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities are non-stationary because other agents update their policies simultaneously, so convergence of independent Q-learning is not guaranteed. To deal with non-stationarity, we first introduce stationary ideal transition probabilities, on which independent Q-learning could converge to the global optimum. Further, we propose a fully decentralized method, I2Q, which performs independent Q-learning on the modeled ideal transition function to reach the global optimum. Modeling the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which frees I2Q from non-stationarity and lets it learn the optimal policy. Empirically, we show that I2Q achieves remarkable improvement in a variety of cooperative multi-agent tasks.

decentralized q-learning algorithm, independent q-learning, name change, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
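
The contrast the abstract draws, between values that depend on what other agents happen to be doing and values defined under a stationary "ideal" assumption, can be illustrated with a one-step toy example. The sketch below is not the I2Q algorithm and is not taken from the paper: the payoff table, the function names, and the use of a fully known joint payoff are all assumptions made for illustration (a decentralized agent would not have access to this table). It only shows why evaluating each action as if the other agent responds optimally can recover the joint optimum where averaging over the other agent's behaviour does not.

```python
# Hypothetical 3x3 cooperative payoff table (an assumption, not from the paper).
# PAYOFF[a1][a2] is the shared reward when agent 1 plays a1 and agent 2 plays a2.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def averaged_values(payoff):
    """Per-action value for the row agent if the other agent acts uniformly at random."""
    return [sum(row) / len(row) for row in payoff]


def ideal_values(payoff):
    """Per-action value for the row agent if the other agent always best-responds."""
    return [max(row) for row in payoff]


def transpose(payoff):
    """Swap roles so the column agent can be evaluated as a row agent."""
    return [list(col) for col in zip(*payoff)]


def greedy(values):
    """Index of the highest-valued action (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Agent 1 evaluated against a randomly exploring partner.
naive_a1 = greedy(averaged_values(PAYOFF))

# Both agents evaluated under the optimistic "partner acts optimally" assumption.
ideal_a1 = greedy(ideal_values(PAYOFF))
ideal_a2 = greedy(ideal_values(transpose(PAYOFF)))

print("averaged choice for agent 1:", naive_a1)
print("ideal joint choice:", (ideal_a1, ideal_a2), "reward:", PAYOFF[ideal_a1][ideal_a2])
```

In this toy table the averaged values steer agent 1 toward the safe action 2, whereas both agents pick action 0 under the ideal assumption and obtain the maximum reward of 11. The averaged values would also change whenever the partner's behaviour changes, which is the non-stationarity the abstract refers to; the ideal values depend only on the table.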

A Proof

Neural Information Processing Systems | Aug-16-2025, 12:04:36 GMT

In Section 4.2, we have shown the effectiveness of ... In Section 3.4, we have analyzed that I2Q can easily solve tasks with multiple optimal joint policies. Here, we give another way to solve this problem. D3G cannot obtain a winning rate in SMAC, as shown in Table 1. Although the QSS value is a biased estimate in this implementation, the implementation without a forward model is practical. The results are shown in Figure 16.

artificial intelligence, implementation, machine learning, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.31)

8078e8c3055303a884ffae2d3ea00338-Paper-Conference.pdf

Neural Information Processing Systems | Aug-16-2025, 12:04:33 GMT

agent, international conference, transition probability, (10 more...)

Country: Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)

I2Q: A Fully Decentralized Q-Learning Algorithm

Neural Information Processing Systems | Jan-16-2025, 01:44:39 GMT

Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks in which global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities are non-stationary because other agents update their policies simultaneously, so convergence of independent Q-learning is not guaranteed. To deal with non-stationarity, we first introduce stationary ideal transition probabilities, on which independent Q-learning could converge to the global optimum. Further, we propose a fully decentralized method, I2Q, which performs independent Q-learning on the modeled ideal transition function to reach the global optimum. Modeling the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which frees I2Q from non-stationarity and lets it learn the optimal policy.

decentralized q-learning algorithm, ideal transition function, independent q-learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
